kvstreamer: remove temporary allocations for []Result #82387
Conversation
Force-pushed from 2a23775 to 0da4952 (Compare)
Did a first pass. The optimizations look like they're paying off! Nice work.
Reviewed 2 of 3 files at r12, 3 of 3 files at r14, 1 of 2 files at r15, 1 of 4 files at r16.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @rharding6373, and @yuzefovich)
-- commits
line 163 at r13:
Can we avoid the code duplication from reimplementing two sets of heap methods by making a helper that can implement a min heap that works for both?
-- commits
line 201 at r14:
Turning this into a LIFO queue is a neat optimization.
pkg/kv/kvclient/kvstreamer/results_buffer.go
line 271 at r12 (raw file):
```go
func (b *outOfOrderResultsBuffer) doneAddingLocked() {
	b.Mutex.AssertHeld()
	b.signal()
```

How much of a performance hit is it to signal at the end of `addLocked` every time it's called and remove `doneAddingLocked` for reduced complexity?

Edit: I see in a later commit `doneAddingLocked` is also used for budgeting purposes, so getting rid of it may not be optimal anyway.
pkg/kv/kvclient/kvstreamer/results_buffer.go
line 271 at r15 (raw file):
```go
func (b *outOfOrderResultsBuffer) doneAddingLocked() {
	b.Mutex.AssertHeld()
```
Isn't it still advisable to hold the lock while signaling on the channel?
pkg/kv/kvclient/kvstreamer/results_buffer.go
line 297 at r16 (raw file):
```go
func (b *outOfOrderResultsBuffer) doneAddingLocked(ctx context.Context) {
	b.accountForOverheadLocked(ctx, int64(cap(b.results))*resultSize)
```

If I understand this correctly, we can go into memory debt doing this. But in the previous commit we reuse `results`, right? So we also never reduce the capacity of `results`, which may cause us to stay in debt?
pkg/kv/kvclient/kvstreamer/streamer.go
line 1372 at r12 (raw file):
```go
result.memoryTok.toRelease = getResponseSize(get)
memoryTokensBytes += result.memoryTok.toRelease
s.results.addLocked(result)
```

I'm a bit concerned about having calls to `addLocked` deep in here when the lock is acquired in a conditional above, and that the conditions being the same aren't immediately obvious. The assertion that the lock is held in `addLocked` helps, but this seems complicated. Will think about this more, nothing actionable atm.
Force-pushed from 0da4952 to a675884 (Compare)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
Previously, rharding6373 (Rachael Harding) wrote…
Can we avoid the code duplication from reimplementing two sets of heap methods by making a helper that can implement a min heap that works for both?
I don't think we can do this without sacrificing performance (or, more specifically, without an increase in allocations). We could get around this by using generics and then having a single helper, but without generics the helper would have to operate on `interface{}`, which escapes to the heap (leading to allocations), whereas the current code operates on the concrete structs `singleRangeBatch` and `inOrderBufferedResult`. Another way to have a single helper would be to unify these two structs into one, but that seems extremely dirty.

Thus, I think we should either proceed with the duplicated approach or get rid of the commit altogether. Given the perf wins and the reduction in allocations, I think the code duplication is worth it.
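For illustration, a generics-based helper along the lines mentioned above could look like the following minimal sketch (hypothetical names, not code from this PR). Because the element type is instantiated with a concrete struct, elements stay in a plain slice and are never boxed into `interface{}`:

```go
package heaputil // hypothetical package, for illustration only

// minHeap is a generic min-heap over a concrete element type E. When E is a
// concrete struct (e.g. something like singleRangeBatch or
// inOrderBufferedResult), elements are stored inline in a slice with no
// interface{} boxing.
type minHeap[E any] struct {
	elems []E
	less  func(a, b E) bool
}

// push appends e and sifts it up toward the root.
func (h *minHeap[E]) push(e E) {
	h.elems = append(h.elems, e)
	for i := len(h.elems) - 1; i > 0; {
		parent := (i - 1) / 2
		if !h.less(h.elems[i], h.elems[parent]) {
			break
		}
		h.elems[i], h.elems[parent] = h.elems[parent], h.elems[i]
		i = parent
	}
}

// pop removes and returns the minimum element (the heap must be non-empty).
func (h *minHeap[E]) pop() E {
	top := h.elems[0]
	last := len(h.elems) - 1
	h.elems[0] = h.elems[last]
	h.elems = h.elems[:last]
	for i := 0; ; {
		left, right := 2*i+1, 2*i+2
		smallest := i
		if left < len(h.elems) && h.less(h.elems[left], h.elems[smallest]) {
			smallest = left
		}
		if right < len(h.elems) && h.less(h.elems[right], h.elems[smallest]) {
			smallest = right
		}
		if smallest == i {
			break
		}
		h.elems[i], h.elems[smallest] = h.elems[smallest], h.elems[i]
		i = smallest
	}
	return top
}
```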
pkg/kv/kvclient/kvstreamer/results_buffer.go
line 271 at r12 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
How much of a performance hit is it to signal at the end of `addLocked` every time it's called and remove `doneAddingLocked` for reduced complexity?

Edit: I see in a later commit `doneAddingLocked` is also used for budgeting purposes, so getting rid of it may not be optimal anyway.
I actually had the same thought initially - whether it is worth introducing this `doneAddingLocked` method to be able to "batch" the results returned on a `get` call (if we signaled on the channel every time in `addLocked`, in theory `get` could be returning one result at a time) - and wanted to prototype the approach without `doneAddingLocked`. But then I added the memory accounting, and this method is a very convenient place to do that, so I decided it's probably worth keeping.

The semantics of `addLocked` and `doneAddingLocked` seem pretty clean, so I ended up not prototyping what the performance implications / mutex contention would be if we did the memory accounting as well as the signaling on the channel on every `addLocked` call.
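To make the batching semantics concrete, here is a minimal standalone sketch (hypothetical types, heavily simplified relative to the real results buffer): `addLocked` only buffers, and a single signal per `doneAddingLocked` lets the consumer observe the whole batch in one wake-up:

```go
package buffersketch // hypothetical, illustrative only

import "sync"

// Result stands in for kvstreamer.Result.
type Result struct{}

// batchBuffer mimics the addLocked / doneAddingLocked split: adding a result
// never wakes up the consumer; doneAddingLocked signals at most once per
// batch.
type batchBuffer struct {
	mu       sync.Mutex
	results  []Result
	hasItems chan struct{} // capacity 1
}

func newBatchBuffer() *batchBuffer {
	return &batchBuffer{hasItems: make(chan struct{}, 1)}
}

// addLocked appends a single result. The caller must hold mu.
func (b *batchBuffer) addLocked(r Result) {
	b.results = append(b.results, r)
}

// doneAddingLocked signals the consumer once for the whole batch. The caller
// must hold mu.
func (b *batchBuffer) doneAddingLocked() {
	select {
	case b.hasItems <- struct{}{}:
	default: // a signal is already pending
	}
}

// get blocks until a batch has been published and returns everything buffered.
func (b *batchBuffer) get() []Result {
	<-b.hasItems
	b.mu.Lock()
	defer b.mu.Unlock()
	res := b.results
	b.results = nil
	return res
}
```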
pkg/kv/kvclient/kvstreamer/results_buffer.go
line 271 at r15 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
Isn't it still advisable to hold the lock while signaling on the channel?
Yeah, we're still holding the lock, and it is checked in `accountForOverheadLocked`. I removed it here to avoid the duplication but can bring it back if you think it makes things clearer - there should be no performance impact whatsoever from asserting that the mutex is held in regular builds.
pkg/kv/kvclient/kvstreamer/results_buffer.go
line 297 at r16 (raw file):
You're right that the budget can go into debt (since the results buffer doesn't have a push-back mechanism, and it would be sad - and more difficult - to drop already completed results). The idea is that the worker coordinator will see that the budget is in debt, so it won't issue any more requests; eventually the caller will consume results via a `GetResults` call, and the budget should get out of debt once those results are `Release`d. Note that here we account only for the overhead of the results (i.e. the shallow size of the `Result` object, which is a constant) and not the KV response (which can be arbitrarily large), so it should be relatively unlikely that we get into debt when accounting for the overhead.

> But in the previous commit we reuse `results`, right?

The previous commit reuses the results only in the InOrder mode and only for the return value of the `get` call. In the InOrder mode we need two slices - one for the buffered results (all results we have received but not returned yet) and another to return on the `get` call (all the results that we can return according to the ordering). The first slice is reused implicitly since we're maintaining a heap over it (meaning that we're removing from the tail and adding to the tail - and then pushing elements down if needed); the second slice is reused explicitly. Initially I forgot to account for the second slice, fixed.

> So we also never reduce the capacity of `results`, which may cause us to stay in debt?

In theory, I guess it's possible, but in reality this should never happen. It could happen if we're using the InOrder mode and the overhead of `buffered` and `resultsScratch` is large in comparison to the total budget, so that even when we spill results to disk, the overhead (which is always kept in memory) still keeps the budget in debt. I could only see this happen if the budget's limit is too small.

In the OutOfOrder mode the capacity of `results` can decrease since we are not reusing the slice after the `get` call. (I have thought about trying to reuse it, but it's quite complex given that we don't know when results get `Release`d - we might have a concurrent `addLocked` call add a new result into the reused slice before the corresponding result has been processed by the caller.)
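A minimal sketch of the overhead accounting being discussed (hypothetical names and a toy budget, not the streamer's actual budget type): only the constant shallow size of each `Result` slot is charged, and exceeding the limit merely pauses issuing new requests until results are released:

```go
package budgetsketch // hypothetical, illustrative only

import "unsafe"

// Result stands in for kvstreamer.Result; only its shallow size matters here,
// not the KV response it points to.
type Result struct {
	_ [64]byte // placeholder fields
}

// resultSize is the constant per-element overhead of holding a Result in a slice.
const resultSize = int64(unsafe.Sizeof(Result{}))

// budget is a toy memory budget that is allowed to go into debt.
type budget struct {
	limit int64
	used  int64
}

// accountForOverhead charges the overhead of a results slice: its capacity
// times the constant per-Result size. This can push the budget into debt.
func (b *budget) accountForOverhead(resultsCap int) {
	b.used += int64(resultsCap) * resultSize
}

// release returns bytes to the budget once the caller has consumed results.
func (b *budget) release(bytes int64) {
	b.used -= bytes
}

// inDebt tells the worker coordinator to stop issuing new requests until
// enough results have been released.
func (b *budget) inDebt() bool {
	return b.used > b.limit
}
```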
pkg/kv/kvclient/kvstreamer/streamer.go
line 1372 at r12 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
I'm a bit concerned about having calls to `addLocked` deep in here when the lock is acquired in a conditional above, and that the conditions being the same aren't immediately obvious. The assertion that the lock is held in `addLocked` helps, but this seems complicated. Will think about this more, nothing actionable atm.
I agree that the structure of the code doesn't make it clear that the right locks are held, but I'm not sure how to improve it - any suggestions are very welcome. I did add some test assertions though.
Force-pushed from a675884 to 78308b0 (Compare)
Added another commit. At the moment of writing, the first commit is part of a different PR.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
pkg/kv/kvclient/kvstreamer/streamer.go
line 1372 at r12 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I agree that the structure of the code doesn't make it clear that the right locks are held, but I'm not sure how to improve it - any suggestions are very welcome. I did add some test assertions though.
Drive-by while waiting for some code to compile. You've effectively got the same conditional inside this function twice, once at the top and once at the bottom, and then you've got an assumption about its value in the middle. This makes me think one way to make it cleaner is to split it into two different functions - perhaps `processEmptySingleRangeResults` and what you've got now, where you check the condition at the top and call down to the special case; then the logic in the body of the current function can assume you need to lock and make that clear, etc. Maybe there will be a touch of duplication, but it seems worth it.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
pkg/kv/kvclient/kvstreamer/streamer.go
line 1372 at r12 (raw file):
Previously, ajwerner wrote…
Drive-by while waiting for some code to compile. You've effectively got the same conditional inside this function twice, once at the top and once at the bottom, and then you've got an assumption about its value in the middle. This makes me think one way to make it cleaner is to split it into two different functions - perhaps `processEmptySingleRangeResults` and what you've got now, where you check the condition at the top and call down to the special case; then the logic in the body of the current function can assume you need to lock and make that clear, etc. Maybe there will be a touch of duplication, but it seems worth it.

Drive-bys are always welcome!

I read over your suggestion several times, but I'm not sure I follow. Are you suggesting splitting out a single loop over `br.Responses` into two loops, each in a separate function (one would process responses for which we create results, and another would process responses with the resume spans which we add into the resume request)?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
pkg/kv/kvclient/kvstreamer/streamer.go
line 1372 at r12 (raw file):
> Are you suggesting splitting out a single loop over `br.Responses` into two loops, each in a separate function (one would process responses for which we create results, and another would process responses with the resume spans which we add into the resume request)?
Yep. See how these two commits make you feel: https://github.com/yuzefovich/cockroach/compare/streamer-optimizations...ajwerner:streamer-optimizations?expand=1
Force-pushed from d384520 to 03d3286 (Compare)
I took up Andrew's suggestion and included it in a new commit. I also removed most of the commits from this PR for ease of reviewing. The first commit is still part of another PR and should be ignored here. PTAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rharding6373)
pkg/kv/kvclient/kvstreamer/streamer.go
line 1372 at r12 (raw file):
Previously, ajwerner wrote…
Are you suggesting splitting out a single loop over br.Responses into two loops, each in a separate function (one would process responses for which we create results, and another would process responses with the resume spans which we add into the resume request)?
Yep. See how these two commits make you feel: https://github.com/yuzefovich/cockroach/compare/streamer-optimizations...ajwerner:streamer-optimizations?expand=1
Makes sense, thanks for typing this out. I was a bit concerned about the performance impact of having an additional loop (since on each request we perform pretty expensive type switches and interface dispatching), but the microbenchmarks don't reveal any perf hit, so I think the cleanup offered by this change is worth it.
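For reference, the shape of the two-loop refactor looks roughly like the following standalone sketch (hypothetical toy types, not the actual streamer code, which operates on the KV batch response):

```go
package twoloops // hypothetical sketch of the two-pass structure

// response is a stand-in for a single KV response within a batch response.
type response struct {
	numRows    int
	resumeSpan *span // non-nil if the response is incomplete
}

type span struct{ key, endKey string }

// result is a stand-in for kvstreamer.Result.
type result struct{ rows int }

// processResponses separates the two concerns into two loops: the first pass
// only creates results for non-empty responses, the second pass only collects
// resume spans for the resume request.
func processResponses(responses []response) (results []result, resume []span) {
	for _, r := range responses {
		if r.numRows > 0 {
			results = append(results, result{rows: r.numRows})
		}
	}
	for _, r := range responses {
		if r.resumeSpan != nil {
			resume = append(resume, *r.resumeSpan)
		}
	}
	return results, resume
}
```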
Force-pushed from 03d3286 to ded11a4 (Compare)
Both commits are relevant to this PR.
I left a few thoughts below but they're not important.
Reviewed 1 of 1 files at r26, 9 of 9 files at r28, 1 of 9 files at r29, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @michae2, @rharding6373, and @yuzefovich)
pkg/kv/kvclient/kvstreamer/results_buffer.go
line 271 at r29 (raw file):
```go
func (b *outOfOrderResultsBuffer) doneAddingLocked() {
	b.Mutex.AssertHeld()
	b.signal()
```

nit: Now that these are separate methods, you might consider guarding this signal with `if len(b.results) > 0`.
pkg/kv/kvclient/kvstreamer/streamer.go
line 1312 at r29 (raw file):
```go
s.mu.avgResponseEstimator.update(
	fp.memoryFootprintBytes, int64(fp.numGetResults+fp.numScanResults),
)
```

suggestion: How about doing this at the end of this function, with `int64(fp.numGetResults + someOtherCounter)`, where `someOtherCounter` is incremented whenever `result.scanComplete` is true?

Oh, you're trying to avoid holding the streamer mutex for the whole function. 🤔 Well, never mind.

Oh, this value is already in `resultsBufferBase.numCompleteResponses`. Maybe the `avgResponseEstimator` should be part of `resultsBufferBase`?
Anyway, not super important.
Code quote:
```go
// TODO(yuzefovich): some of the responses might be partial, yet the
// estimator doesn't distinguish the footprint of the full response vs
// the partial one. Think more about this.
s.mu.avgResponseEstimator.update(
	fp.memoryFootprintBytes, int64(fp.numGetResults+fp.numScanResults),
)
```
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @michae2, @rharding6373, and @yuzefovich)
pkg/kv/kvclient/kvstreamer/results_buffer.go
line 48 at r29 (raw file):
```go
// moment. The boolean indicates whether all expected Results have been
// returned. Must be called without holding the budget's mutex.
get(context.Context) (_ []Result, allComplete bool, _ error)
```

Random thought: if you want to optimize further (and make the streamer even more like io_uring) you could change `get` to return a pointer to a single `Result` rather than returning `[]Result` (or turn it into some kind of result-by-result iterator).
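For what it's worth, an iterator-flavored version of that idea could look roughly like this sketch (hypothetical interface, not part of the PR):

```go
package itersketch // hypothetical, illustrative only

import "context"

// Result stands in for kvstreamer.Result.
type Result struct{}

// resultIterator sketches a result-by-result alternative to get(): the
// consumer pulls one Result at a time instead of receiving a []Result, so no
// slice needs to be allocated for the return value.
type resultIterator interface {
	// next blocks until a Result is available; ok is false once no further
	// Results will be produced for the current batch of requests.
	next(ctx context.Context) (r Result, ok bool, err error)
	// allComplete reports whether all expected Results have been returned.
	allComplete() bool
}
```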
This commit refactors the code which processes the batch response in order to separate out two different concerns:
- processing non-empty responses in order to create `Result`s
- processing incomplete responses to populate the resume request.

Now each of these "concerns" is handled in a separate loop, making it easier to reason about, especially so in the following commit.

This commit also extracts out multiple return arguments of a function into a struct as well as updates some of the comments and moves some of the error-checking to an earlier stage.

Release note: None

Previously, the worker goroutine would accumulate all `Result`s it can create based on the KV response into a slice, and then the slice would be passed into the results buffer. At that point, the slice would be discarded since the results buffer would copy all `Result`s into its internal state.

This commit refactors the streamer as well as the results buffer to avoid this temporary allocation of `[]Result`. The main idea is that `Result`s are now passed one-by-one into the results buffer. The worker goroutine now acquires the results buffer's mutex, processes the KV responses one at a time, and whenever a `Result` is created, it is added into the results buffer right away. However, in order to prevent the results buffer from eagerly returning a single `Result` on a `GetResults` call, the streamer's user goroutine won't be woken up until a newly introduced `doneAddingLocked` method is called by the worker goroutine.

Some care needs to be taken to prevent deadlocks with all of the mutexes. Now, since we're finalizing the results one at a time, we might need to hold the streamer's mutex (so that we can set `scanComplete` correctly), and that mutex must be acquired before the results buffer's one.

This change shows a modest improvement on the microbenchmarks but is a lot more important on analytical, TPCH-like queries, where this `[]Result` is one of the largest sources of garbage.

```
name                                                     old time/op    new time/op    delta
IndexJoin/Cockroach-24                                     5.98ms ± 1%    5.95ms ± 1%      ~     (p=0.079 n=9+10)
IndexJoin/MultinodeCockroach-24                            7.55ms ± 1%    7.59ms ± 1%    +0.47%  (p=0.015 n=8+9)
IndexJoinColumnFamilies/Cockroach-24                       8.68ms ± 3%    8.56ms ± 2%      ~     (p=0.133 n=9+10)
IndexJoinColumnFamilies/MultinodeCockroach-24              11.8ms ± 5%    11.7ms ± 3%      ~     (p=0.315 n=10+10)
LookupJoinEqColsAreKeyNoOrdering/Cockroach-24              6.67ms ± 1%    6.69ms ± 1%      ~     (p=0.315 n=10+9)
LookupJoinEqColsAreKeyNoOrdering/MultinodeCockroach-24     7.87ms ± 1%    7.92ms ± 1%    +0.73%  (p=0.015 n=10+10)
LookupJoinEqColsAreKeyOrdering/Cockroach-24                9.30ms ± 2%    9.31ms ± 4%      ~     (p=0.796 n=10+10)
LookupJoinEqColsAreKeyOrdering/MultinodeCockroach-24       10.9ms ± 4%    10.9ms ± 2%      ~     (p=0.971 n=10+10)
LookupJoinNoOrdering/Cockroach-24                          8.99ms ± 1%    9.03ms ± 4%      ~     (p=0.549 n=9+10)
LookupJoinNoOrdering/MultinodeCockroach-24                 12.1ms ± 4%    11.9ms ± 6%      ~     (p=0.143 n=10+10)
LookupJoinOrdering/Cockroach-24                            10.9ms ± 3%    10.8ms ± 3%      ~     (p=0.243 n=10+9)
LookupJoinOrdering/MultinodeCockroach-24                   14.2ms ± 5%    13.9ms ± 3%      ~     (p=0.113 n=10+9)

name                                                     old alloc/op   new alloc/op   delta
IndexJoin/Cockroach-24                                     1.36MB ± 1%    1.31MB ± 0%    -3.61%  (p=0.000 n=10+9)
IndexJoin/MultinodeCockroach-24                            2.07MB ± 2%    2.04MB ± 3%      ~     (p=0.063 n=10+10)
IndexJoinColumnFamilies/Cockroach-24                       1.43MB ± 1%    1.38MB ± 0%    -3.56%  (p=0.000 n=9+9)
IndexJoinColumnFamilies/MultinodeCockroach-24              2.27MB ± 1%    2.22MB ± 2%    -2.09%  (p=0.000 n=8+10)
LookupJoinEqColsAreKeyNoOrdering/Cockroach-24              1.71MB ± 0%    1.67MB ± 0%    -2.70%  (p=0.000 n=9+10)
LookupJoinEqColsAreKeyNoOrdering/MultinodeCockroach-24     2.43MB ± 5%    2.35MB ± 1%    -3.31%  (p=0.000 n=10+10)
LookupJoinEqColsAreKeyOrdering/Cockroach-24                1.72MB ± 1%    1.62MB ± 1%    -6.20%  (p=0.000 n=10+10)
LookupJoinEqColsAreKeyOrdering/MultinodeCockroach-24       2.39MB ± 2%    2.30MB ± 3%    -3.53%  (p=0.000 n=10+10)
LookupJoinNoOrdering/Cockroach-24                          1.79MB ± 1%    1.74MB ± 1%    -2.80%  (p=0.000 n=10+9)
LookupJoinNoOrdering/MultinodeCockroach-24                 2.35MB ± 3%    2.32MB ± 2%      ~     (p=0.079 n=10+9)
LookupJoinOrdering/Cockroach-24                            1.63MB ± 1%    1.53MB ± 1%    -5.77%  (p=0.000 n=10+10)
LookupJoinOrdering/MultinodeCockroach-24                   2.30MB ± 4%    2.23MB ± 2%    -3.41%  (p=0.002 n=9+8)

name                                                     old allocs/op  new allocs/op  delta
IndexJoin/Cockroach-24                                      7.15k ± 1%     7.16k ± 1%      ~     (p=0.888 n=10+9)
IndexJoin/MultinodeCockroach-24                             11.9k ± 2%     11.9k ± 2%      ~     (p=0.968 n=10+9)
IndexJoinColumnFamilies/Cockroach-24                        11.9k ± 0%     11.9k ± 0%      ~     (p=0.075 n=9+10)
IndexJoinColumnFamilies/MultinodeCockroach-24               17.6k ± 1%     17.5k ± 1%      ~     (p=0.566 n=10+10)
LookupJoinEqColsAreKeyNoOrdering/Cockroach-24               9.86k ± 1%     9.88k ± 1%      ~     (p=0.150 n=9+10)
LookupJoinEqColsAreKeyNoOrdering/MultinodeCockroach-24      14.1k ± 0%     14.1k ± 1%      ~     (p=0.055 n=8+10)
LookupJoinEqColsAreKeyOrdering/Cockroach-24                 12.6k ± 1%     12.5k ± 1%    -0.77%  (p=0.005 n=10+10)
LookupJoinEqColsAreKeyOrdering/MultinodeCockroach-24        17.2k ± 1%     17.0k ± 0%    -0.88%  (p=0.000 n=10+8)
LookupJoinNoOrdering/Cockroach-24                           12.3k ± 1%     12.3k ± 1%      ~     (p=0.929 n=10+10)
LookupJoinNoOrdering/MultinodeCockroach-24                  16.8k ± 1%     16.8k ± 1%      ~     (p=0.968 n=9+10)
LookupJoinOrdering/Cockroach-24                             14.5k ± 1%     14.5k ± 1%      ~     (p=0.271 n=10+10)
LookupJoinOrdering/MultinodeCockroach-24                    19.4k ± 1%     19.3k ± 1%      ~     (p=0.056 n=9+8)
```

Release note: None
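To illustrate the locking order described in the commit message, here is a heavily simplified sketch (hypothetical toy types; the real code holds the streamer's mutex only when it actually needs to update `scanComplete`): the streamer's mutex is acquired before the results buffer's, results are added one at a time, and the consumer is signaled only once per batch:

```go
package locksketch // hypothetical, illustrative only

import "sync"

// Result stands in for kvstreamer.Result.
type Result struct{ scanComplete bool }

type resultsBuffer struct {
	sync.Mutex
	buffered []Result
}

func (b *resultsBuffer) addLocked(r Result) { b.buffered = append(b.buffered, r) }
func (b *resultsBuffer) doneAddingLocked()  { /* signal the consumer exactly once */ }

type streamer struct {
	mu struct {
		sync.Mutex
		scanComplete bool
	}
	results resultsBuffer
}

// processBatch sketches the worker goroutine's flow: the streamer's mutex is
// acquired before the results buffer's mutex (preventing lock-order
// deadlocks), each Result is added as soon as it is created, and the consumer
// is only woken up by doneAddingLocked at the end of the batch.
func (s *streamer) processBatch(newResults []Result) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.results.Lock()
	defer s.results.Unlock()
	for _, r := range newResults {
		if r.scanComplete {
			s.mu.scanComplete = true
		}
		s.results.addLocked(r)
	}
	s.results.doneAddingLocked()
}
```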
Force-pushed from ded11a4 to 6343df3 (Compare)
TFTRs!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @michae2, and @rharding6373)
pkg/kv/kvclient/kvstreamer/results_buffer.go
line 48 at r29 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Random thought: if you want to optimize further (and make the streamer even more like io_uring) you could change `get` to return a pointer to a single `Result` rather than returning `[]Result` (or turn it into some kind of result-by-result iterator).

I think I did prototype such an idea at some point (hoping to remove the allocation needed for `[]Result` in the InOrder mode as well as to reuse the `results` slice in the OutOfOrder mode), but it showed a regression in performance. Added a TODO to try that again.
pkg/kv/kvclient/kvstreamer/results_buffer.go
line 271 at r29 (raw file):
Previously, michae2 (Michael Erickson) wrote…
nit: Now that these are separate methods, you might consider guarding this signal with `if len(b.results) > 0`.

Hm, I think I'll keep it the way it is - the only downside of signaling when there are no results (which shouldn't really happen) is that the streamer's user goroutine will wake up in `Streamer.GetResults` only to go back to sleep, meaning there is little harm.
pkg/kv/kvclient/kvstreamer/streamer.go
line 1312 at r29 (raw file):
> You're trying to avoid holding the streamer mutex for the whole function.

Yeah, I tried to optimize the locking pattern so that we don't hold locks for longer than needed and don't acquire the same lock multiple times inside of the same function.

> This value is already in `resultsBufferBase.numCompleteResponses`.

Not exactly - `fp.numGetResults+fp.numScanResults` is the number of `Result` objects created based on a single BatchRequest that targeted a single range, whereas `resultsBufferBase.numCompleteResponses` tracks the number of "complete" requests (which could be spanning multiple ranges) for the current Enqueue batch of requests. `numCompleteResponses` is reset across `Streamer.Enqueue` calls, whereas the avg response size estimator is updated throughout the lifetime of the Streamer. So `avgResponseEstimator.numResponses` does equal `resultsBufferBase.numCompleteResponses`, but only before the second call to `Enqueue`.

> Maybe the `avgResponseEstimator` should be part of `resultsBufferBase`?
Hm, it's an interesting question. I do see some similarities in these components (e.g. that they handle the results from the BatchResponses), but I think it's cleaner to keep them apart. I think in the future (e.g. when we hook up the Streamer to power regular scans - i.e. TableReaders / cFetchers) the avg response estimator will need to be more sophisticated and it'll probably be better if that complicated logic doesn't make the code around the results buffer more complicated too.
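For context, a running-average estimator of the kind being discussed can be sketched as follows (simplified, hypothetical names and fields; not the actual kvstreamer code): it accumulates over the streamer's whole lifetime and is never reset across `Enqueue` calls:

```go
package estimatorsketch // hypothetical, illustrative only

// avgResponseEstimator keeps a running average of the response footprint per
// created Result. Unlike the results buffer's per-Enqueue counters, it is
// never reset, so it reflects the whole lifetime of the streamer.
type avgResponseEstimator struct {
	responseBytes int64
	numResponses  int64
}

// update records the footprint of the Results created from one single-range
// batch response.
func (e *avgResponseEstimator) update(bytes int64, numResponses int64) {
	e.responseBytes += bytes
	e.numResponses += numResponses
}

// average returns the mean footprint per Result observed so far.
func (e *avgResponseEstimator) average() int64 {
	if e.numResponses == 0 {
		return 0
	}
	return e.responseBytes / e.numResponses
}
```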
Build failed:

Seems like acceptance flake. bors r+

Build succeeded:
kvstreamer: refactor the loop of processing the batch response
This commit refactors the code which processes the batch response in order to separate out two different concerns:
- processing non-empty responses in order to create `Result`s
- processing incomplete responses to populate the resume request.

Now each of these "concerns" is handled in a separate loop, making it easier to reason about, especially so in the following commit.

This commit also extracts out multiple return arguments of a function into a struct as well as updates some of the comments and moves some of the error-checking to an earlier stage.

Release note: None
kvstreamer: remove temporary allocations for []Result
Previously, the worker goroutine would accumulate all `Result`s it can create based on the KV response into a slice, and then the slice would be passed into the results buffer. At that point, the slice would be discarded since the results buffer would copy all `Result`s into its internal state.

This commit refactors the streamer as well as the results buffer to avoid this temporary allocation of `[]Result`. The main idea is that `Result`s are now passed one-by-one into the results buffer. The worker goroutine now acquires the results buffer's mutex, processes the KV responses one at a time, and whenever a `Result` is created, it is added into the results buffer right away. However, in order to prevent the results buffer from eagerly returning a single `Result` on a `GetResults` call, the streamer's user goroutine won't be woken up until a newly introduced `doneAddingLocked` method is called by the worker goroutine.

Some care needs to be taken to prevent deadlocks with all of the mutexes. Now, since we're finalizing the results one at a time, we might need to hold the streamer's mutex (so that we can set `scanComplete` correctly), and that mutex must be acquired before the results buffer's one.

This change shows a modest improvement on the microbenchmarks but is a lot more important on analytical, TPCH-like queries, where this `[]Result` is one of the largest sources of garbage.

Addresses: #82160.

Release note: None